Human coding and computational text analysis are more powerful when combined in an interactive workflow. I offer a suite of exact methods that can increase the power of common hand-coding tasks by several orders of magnitude. Human coding can both inform and be aided by rule-based information extraction, which iteratively structures queries on unstructured text.
Applied to public comments on U.S. federal agency rules, this method turns a hand-coded sample of 10,894 comments into 41 million as-good-as-hand-coded comments, covering both the organizations that mobilized them and the extent to which policy changed in the direction they sought. This large sample enables new analyses of lobbying coalitions, social movements, and policy change.
Workflow: googlesheets4 allows analyzing and improving data in real time. For example, in Figure 1:
Figure 1: Example Coded Comments in a Google Sheet
| Entity | Pattern |
|---|---|
| 3M Co | 3M Co\|3M Cogent\|3M Health Information Systems\|Ceradyne\|Cogent Systems\|Hybrivet Systems |
| Teamsters Union | Brotherhood of Locomotive Engineers & Trainmen\|Brotherhood of Maint of Way Employ Div\|New England Teamsters & Trucking Pension\|Teamsters Airline Express Delivery Div\|Teamsters Local 357\|Teamsters Union\|Western Conf of Teamsters Pension Trust |
Figure 2: Iteratively build regex tables. For example, the legislators package adds legislator name variants (e.g., “AOC”) to standard legislator names.
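A minimal sketch of how a regex table like the one above can link text to entities and grow as hand-coding surfaces new name variants. This is an illustration in Python, not the poster's own code: the function names (`compile_table`, `link_entities`, `add_variant`), the case-insensitive matching, the word-boundary anchoring, and the example variant are all assumptions.

```python
import re

# Entity -> alternation pattern, copied from the example table above.
REGEX_TABLE = {
    "3M Co": (
        "3M Co|3M Cogent|3M Health Information Systems|Ceradyne|"
        "Cogent Systems|Hybrivet Systems"
    ),
    "Teamsters Union": (
        "Brotherhood of Locomotive Engineers & Trainmen|"
        "Brotherhood of Maint of Way Employ Div|"
        "New England Teamsters & Trucking Pension|"
        "Teamsters Airline Express Delivery Div|Teamsters Local 357|"
        "Teamsters Union|Western Conf of Teamsters Pension Trust"
    ),
}


def compile_table(table):
    """Compile each pattern once (assumed: case-insensitive, whole words)."""
    return {
        entity: re.compile(r"\b(?:" + pattern + r")\b", re.IGNORECASE)
        for entity, pattern in table.items()
    }


def link_entities(text, compiled):
    """Return the sorted names of all entities whose pattern matches the text."""
    return sorted(entity for entity, rx in compiled.items() if rx.search(text))


def add_variant(table, entity, variant):
    """Add a hand-coded name variant (regex-escaped) to an entity's pattern."""
    escaped = re.escape(variant)
    if entity in table:
        table[entity] = table[entity] + "|" + escaped
    else:
        table[entity] = escaped
```

In use, each round of hand-coding that finds an unmatched name calls `add_variant`, recompiles the table, and re-runs `link_entities` over the full corpus, so one human decision propagates to every document containing that string.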
By iteratively linking comments to the organizations that wrote or mobilized them (and thereby finding strings that identify similar documents), I find that a small number of professional advocacy organizations mobilize the vast majority of comments. The top 100 organizations mobilized 43,938,811 comments. The top ten organizations mobilized 25,947,612 comments.
| Organization | Rules Lobbied On | Pressure Campaigns | Percent (Campaigns / Rules) | Comments | Average Comments per Campaign |
|---|---|---|---|---|---|
| NRDC | 530 | 62 | 11.7% | 5,939,264 | 95,795 |
| Sierra Club | 591 | 110 | 18.6% | 5,111,922 | 46,472 |
| CREDO | 90 | 41 | 45.6% | 3,019,150 | 73,638 |
| Environmental Defense Fund | 111 | 31 | 27.9% | 2,849,517 | 91,920 |
| Center For Biological Diversity | 572 | 86 | 15.0% | 2,815,509 | 32,738 |
| Earthjustice | 235 | 59 | 25.1% | 2,080,583 | 35,264 |
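The two derived columns in the table follow directly from the counts: the percent is pressure campaigns divided by rules lobbied on, and the average is comments divided by campaigns. A short Python sketch reproducing them from the table's own numbers (the function name `derived_columns` and the rounding choices are assumptions made for illustration):

```python
# (organization, rules lobbied on, pressure campaigns, comments), from the table.
ROWS = [
    ("NRDC", 530, 62, 5_939_264),
    ("Sierra Club", 591, 110, 5_111_922),
    ("CREDO", 90, 41, 3_019_150),
    ("Environmental Defense Fund", 111, 31, 2_849_517),
    ("Center For Biological Diversity", 572, 86, 2_815_509),
    ("Earthjustice", 235, 59, 2_080_583),
]


def derived_columns(rules, campaigns, comments):
    """Return (percent of rules with a campaign, average comments per campaign)."""
    percent = round(100 * campaigns / rules, 1)
    average = round(comments / campaigns)
    return percent, average


for name, rules, campaigns, comments in ROWS:
    percent, average = derived_columns(rules, campaigns, comments)
    print(f"{name}: {percent}% of rules, {average:,} comments per campaign")
```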
Figure 3: Iteratively cluster documents with repeated text
Figure 4: Identifying Coalitions by the Percent of Matching Text in a Sample of Public Comments using a 10-gram Window
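One way to compute the percent of matching text with a 10-gram window, and to group documents that share repeated text, is sketched below in Python. This is an illustrative assumption rather than the poster's implementation: the tokenizer, the function names, the 50 percent threshold, and the connected-components grouping are all choices made for the example.

```python
import re


def ngrams(text, n=10):
    """Return the set of n-word windows in the text (lowercased tokens)."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def percent_matching(doc, other, n=10):
    """Percent of doc's n-grams that also appear in other (0.0 if doc is too short)."""
    doc_grams = ngrams(doc, n)
    if not doc_grams:
        return 0.0
    return 100 * len(doc_grams & ngrams(other, n)) / len(doc_grams)


def cluster(docs, threshold=50.0, n=10):
    """Group document indices whose shared text meets the (assumed) threshold.

    Uses a simple union-find over all pairs; n-grams are recomputed per pair,
    which is fine for a small sample but too slow for millions of comments.
    """
    parent = list(range(len(docs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            share = max(
                percent_matching(docs[i], docs[j], n),
                percent_matching(docs[j], docs[i], n),
            )
            if share >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(docs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Taking the larger of the two directional shares means a long comment that embeds a short form letter still clusters with it, even though only a small part of the long comment is copied text.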
Figure 5: Most Comments Result from Public Pressure Campaigns, 2005-2020
Preprocessing tips: Digitizing allows humans to paste text exactly matching machine-read strings. Summaries (e.g., textrank’s top 3 sentences) speed hand-coding.
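To illustrate the summary tip, here is a rough TextRank-style extractive summarizer in Python that returns the top three sentences in their original order. It is a simplified sketch, not the textrank package's implementation: the sentence splitter, the Jaccard word-overlap similarity, the damping factor, and the iteration count are assumptions.

```python
import re


def split_sentences(text):
    """Naive splitter: break after ., !, or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def word_set(sentence):
    """Lowercased set of words in a sentence."""
    return set(re.findall(r"[a-z']+", sentence.lower()))


def summarize(text, k=3, damping=0.85, iterations=50):
    """Return the k highest-ranked sentences, kept in document order."""
    sentences = split_sentences(text)
    n = len(sentences)
    if n <= k:
        return sentences

    # Sentence-to-sentence similarity: Jaccard overlap of word sets.
    bags = [word_set(s) for s in sentences]
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and bags[i] and bags[j]:
                sim[i][j] = len(bags[i] & bags[j]) / len(bags[i] | bags[j])

    # PageRank-style power iteration over the similarity graph.
    out_weight = [sum(row) for row in sim]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(
                scores[j] * sim[j][i] / out_weight[j]
                for j in range(n)
                if out_weight[j] > 0
            )
            for i in range(n)
        ]

    top = sorted(sorted(range(n), key=lambda i: -scores[i])[:k])
    return [sentences[i] for i in top]
```

Showing coders only these sentences alongside a link to the full comment lets them judge most documents at a glance while keeping the original text one click away.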
Figure 6: Lobbying Success by Number of Supportive Comments
Public pressure to address climate change, along with environmental justice movements, had large effects on policy documents, but a small number of national advocacy organizations dominate lobbying coalitions. When tribal governments or local groups lobby without the support of national advocacy groups, policymakers typically ignore them.
(linkit, fastlink, ML with hand-coded training set)